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Abstract. 

A deflated restarted Lanczos algorithm is given for both solving symmetric linear equations and 
computing eigenvalues and eigenvectors. The restarting limits the storage so that finding eigenvectors 
is practical. Meanwhile, the deflating from the presence of the eigenvectors allows the linear equations 
to generally have good convergence in spite of the restarting. Some reorthogonalization is necessary to 
control roundoff error, and several approaches are discussed. The eigenvectors generated while solving 
the linear equations can be used to help solve systems with multiple right-hand sides. Experiments 
are given with large matrices from quantum chromodynamics that have many right-hand sides. 
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1. Introduction. We consider a large matrix A that is either real symmetric 
or complex Hermitian. We are interested in solving the system of equations Ax = 
&, possibly with multiple right-hand sides, and in solving the associated eigenvalue 
problem. Both eigenvalues and eigenvectors are desired. Symmetric and Hermitian 
problems can take advantage of fast algorithms such as the conjugate gradient method 
(CG) [52] for linear equations and the related Lanczos algorithm HB] for 
eigenvalues. However, regular CG can be improved upon for the case of multiple 
right-hand sides, and Lanczos may have storage and accuracy issues while computing 
eigenvectors. We give new methods for these problems. 

An approach is presented called Lanczos with deflated restarting or Lan-DR. It 
simultaneously solves the linear equations and computes the eigenvalues and eigen- 
vectors. A restarted Krylov subspace approach is used for the linear equations, but 
it also saves approximate eigenvectors at the restart and uses them in the subspace 
for the next cycle. The restarting of the Lanczos algorithm does not slow down the 
convergence as it normally would, because once the approximate eigenvectors are ac- 
curate enough, they essentially remove or deflate the associated eigenvalues from the 
problem. The eigenvalue portion of Lan-DR has already been presented in 68J, but 
as mentioned, we add on the solution of linear equations. Also, some reorthogonal- 
ization is necessary to control roundoff error. We give some new approaches for this 
reorthogonalization. We also give a Minres/harmonic version of the algorithm. 
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An important example where both linear equations need to be solved and eigen- 
vectors are desired is the case of multiple right-hand sides. The eigenvector informa- 
tion generated for one right-hand side can be used to improve the convergence of the 
systems with subsequent right-hand sides. We give a method called deflated conjugate 
gradients or D-CG that implements such an approach. For the second and subsequent 
right-hand sides, there is first a projection over the approximate eigenvectors gener- 
ated by Lan-DR on the first right-hand side. Then the conjugate gradient iteration 
is applied. We will give application to large Hermitian systems of linear equations in 
quantum chromodynamics (QCD). 

Section 2 has a review of methods that this paper builds on. The Lan-DR method 
for eigenvalues and linear equations is presented in Section 3. Section 4 has compar- 
ison of reorthogonalization approaches. Multiple right-hand sides are considered in 
Section 5. Then Section 6 has Minres-DR and deflated Minres, which are of interest 
particularly for indefinite systems and interior eigenvalues. 

2. Review. 

2.1. Restarted methods for eigenvalue problems. Restarted Krylov meth- 
ods for nonsymmetric eigenvalue problems took a jump forward with the implicitly 
restarted Arnoldi method (IRAM) by Sorensen 63 . IRAM restarts with several 
vectors, so it not only computes many eigenvectors simultaneously, but it also has im- 
proved convergence. At the time of a restart, let the Ritz vectors be {j/i, ?/2, ■ • • , Dk] 
and let r be a multiple of the residual vectors for these Ritz vectors (the residuals are 
parallel) . Then the next cycle of IRAM builds the subspace 

Span{y u y 2 , . . . y k , r, Ar, A 2 r : A 3 r : . . . , A^^r}. (2.1) 

This subspace is equivalent [29] to 

Span{yi,y 2 , ■ . ■ y k ,A Vl , A 2 Vl , A 3 y t , . . . , A m - k Vi \. (2.2) 

This last form helps show why IRAM is effective. The subspace contains a Krylov 
subspace with each approximate eigenvector as starting vector. 

Versions of IRAM for symmetric problems are given in J6[ [4] . A mathematically 
equivalent method to IRAM that does not use the implicit restarting is in [29] (this 
approach is equivalent at the end of each cycle). Wu and Simon present a mathe- 
matically equivalent approach for symmetric problems called thick restarted Lanczos 
(TRLAN) [5S]. They put the approximate eigenvectors at the beginning of a new 
subspace instead of at the end as was done in [29]. Stewart gives a framework for 
restarted Krylov methods in [65] . See [54] for a symmetric block version. Nonsym- 
metric and harmonic versions of restarted Arnoldi following the TRLAN approach 
are in [35]. For block Arnoldi methods, see [2] and its references. 

2.2. Deflated restarted methods for linear equations. Deflated Krylov 
methods for linear equations compute eigenvectors and use them to deflate eigenvalues 
and thus improve convergence of the linear equations solution. For problems that have 
slow convergence due to small eigenvalues, deflation can make a big difference. For 
nonsymmetric problems, approximate eigenvectors are formed during GMRES [55] 
in[2i[2l[T3[S[53[l[5l[a[12l[30l|3T]. Methods in [12] are related in that they save 
information during restarted GMRES in order to improve convergence. 

Now for symmetric problems, it is assumed in [38 that there is a way to get 
approximate eigenvectors which are used for a deflated CG. Eigenvectors are formed 
during CG in [56] . 
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We now look at a particular deflated GMRES method that will be used in this 
paper. GMRES-DR(m,k) [3T] uses the subspace 

Span{yi, y 2 , . . . y k , r , Ar Q ,A 2 r , A 3 r , . . . , A m - k - l r }, (2.3) 

where ro is the initial residual for the linear equations at the start of the new cycle, 
{yi, i/2, ■ ■ • Vk\ are the harmonic Ritz vectors corresponding to the smallest harmonic 
Ritz values. The dimension of the whole subspace is m, including the k approximate 
eigenvectors (the harmonic Ritz vectors). This can be viewed as a Krylov subspace 
generated with starting vector r augmented with approximate eigenvectors. Remark- 
ably, the whole subspace turns out to be a Krylov subspace itself (though not with 
ro as starting vector) [30]. FOM-DR [31] is a version that, as in FOM [49], uses 
a Galerkin projection instead of minimum residual projection. FOM-DR also needs 
regular Ritz vectors instead of harmonic. 

2.3. Multiple right-hand sides. Systems with multiple right-hand sides occur 
in many applications (see [18] for some examples) . Block methods are a standard way 
to solve systems with multiple right-hand sides (see for example [4U1 [521 IT51 I3"2l |2T| ). 
However, block methods are not ideal for every circumstance. Other approaches for 
multiple right-hand sides use information from the solution of the first right-hand 
side (and possibly others) to assist subsequent right-hand sides. Seed methods [521 
[H [24j [45j [5j] [67] [16] project over entire subspaces generated while solving previous 
right-hand sides. Simoncini and Gallopoulos [60] [61] suggest methods including using 
seeding, using blocks and using Richardson iteration with a polynomial generated from 
GMRES applied to the first right-hand side. In [33] a small subspace is generated with 
GMRES-DR applied to the first right-hand side that contains important information 
of approximate eigenvectors, and this is used to improve the subsequent right-hand 
sides. See [44] for a method for multiple right-hand sides that can also handle a 
changing matrix. 

3. Lanczos with deflated restarting. We propose a restarted Lanczos method 
that both solves linear equations and computes eigenvalues and eigenvectors. It is 
called Lanczos with deflated restarting or Lan-DR. The Lan-DR method is a ver- 
sion of FOM-DR [31] for symmetric and Hcrmitian problems and is closely related 
to GMRES-DR [31]. As mentioned earlier, the eigenvalue portion of Lan-DR is TR- 
LAN [68] and is mathematically equivalent to implicitly restarted Arnoldi [63] . 

For Lan-DR, the number of desired eigenvectors k must be chosen, along with 
which eigenvalues are to be targeted. Normally the eigenvalues nearest the origin are 
the most important ones for deflation purposes, but other eigenpairs can be computed. 
In particular, deflating large outstanding eigenvalues may help convergence of the 
linear equations solution and may be useful for stability. 

At the time of a restart, let ro be the residual vector for the linear equations and 
let the Ritz vectors from the previous cycle be {j/i, y 2 , ■ ■ ■ ,yk}- Then the next cycle 
of Lan-DR builds the subspace 

Span{yi, y%,... y k , r , Ar , A 2 r , A 3 r . . . , A m - fc_1 To}. (3.1) 

Lan-DR generates the recurrence formula 

AV m = V m+1 T m , (3.2) 

where V m is an n by m matrix whose columns span the subspace (|3.1[) and V m +i 
is the same except for an extra column. Also T m is an m + 1 by m matrix that is 
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tridiagonal except for the k + 1 by k + 1 leading portion. This portion is non-zero only 
on the main diagonal (which has the Ritz values) and in the k + 1 row and column. 
A part of recurrence (|3.2p can be separated out to give 

AV k = V k+1 T k , (3.3) 

where V k is an n by k matrix whose columns span the subspace of Ritz vectors, Vfc+i 
is the same except for an extra column and T k is the leading k+ 1 by k portion of T m . 
This recurrence allows access to both the approximate eigenvectors (the Ritz vectors) 
and their products with A while requiring storage of only k + 1 vectors of length n. 
The approximate eigenvectors in Lan-DR span a small Krylov subspace of dimension 
k [Si- 
It is necessary to maintain some degree of orthogonality of the columns of V m +i- 
We mention here an approach to reorthogonalization that we call k-selective reorthog- 
onalization (k-SO). For cycles after the first one, all new Lanczos vectors are reorthog- 
onalized against the k Ritz vectors. This uses Parlett and Scott's idea of selective 
reorthogonalization [17], but is more natural in this setting, because we are already 
computing the approximate eigenvectors. Also, because of the restarting, there is no 
need to store a large subspace. Simon's partial reorthogonalization [SDJ IEE] is also 
a possibility, as is periodic reorthogonalization [2D]. See Section 4 for more on this, 
including hybrid approaches. Full reorthogonalization or partial reorthogonalization 
can be used for the first cycle, but this may not be necessary. By Paige's theo- 
rem [41] [46j [66] , orthogonality is only lost when some eigenvectors begin to converge. 
This may not happen in the first cycle if there are no outstanding eigenvalues. In our 
tests, we do reorthogonalize for the first cycle. 

Lan-DR(m,k) 

1. Start. Choose m, the maximum size of the subspace, and k, the desired 
number of approximate eigenvectors. If there is an initial guess, xq, then the 
problem becomes A(x — xq) = Vq. 

2. First cycle. Apply m iterations of the standard symmetric Lanczos algorithm. 
This computes the matrix V m +i that has the Lanczos vectors as columns and 
the m + 1 by m tridiagonal matrix T m . In addition, fully reorthogonalize all 
the Lanczos vectors. 

3. Eigenvector computation. Compute the k smallest (or others, if desired) 
eigenpairs, (6i,gi), of T m , the m by m portion of T m . For 7 = 1, k, form 
Ui = V m gi- The shortcut residual norm formula is T{ — \tm+l,m9m,i \ ■ 

4. Linear equations. Let c m — V k rg (for the first cycle, this is all zeros except for 
1 1 ro|| in the first position, for other cycles it is all zeros except for ||ro|| in the 
k + 1 position). Solve T m d — c, and set x = xq + V m d. Then r = ro — Ax 

r, — V m +\T m d. If satisfied with convergence of the linear equations and the 
eigenvalues, can stop. If not, let the new xo — x and ro — r and continue. 

5. Restart. For 7 = 1, k, reassign V{ = y{. Set v k +i to be the previous v m +l- 
Set the k by k portion of T k to be the diagonal matrix with the Ritz values 
8i as diagonal entries. For i = 1, k„ the ith element fcth row of T k is 
computed by t m +i, m g m ,i. Set T k =T k . 

6. Main iteration. First we compute the v k +2 vector. Compute w = Av k +i — 
Z)i=i tk+i,iVi, then tm+i, m +i = v k+1 w and w = w - t m+ltm+1 v k +i. Let 
tm+i,m+i = IMI- S e t, v k+ 2 = w/\\w\\. Next, compute the other u, vectors, 
for i — k + 3, . . . , 777 + 1, using the standard Lanczos iteration. T m is formed 
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at the same time. Also reorthogonalize the vectors as desired (see next 
section). Go to step 3. 

We give the expense for Lan-DR(m,k) without reorthogonalization (see the next 
section for reorthogonalization expenses). The cost is m — k matrix- vector products 
per cycle and about 6(m — k) + (k + 2)m length n vector operations (such as dot 
products and daxpy's) per cycle. For k small relative to m, Lan-DR uses one matrix- 
vector product and roughly k + 8 vector ops per iteration. For k near m/2, there are 
about 2k + 10 vector ops per iteration. This compares to about 5 length n vector ops 
per iteration for CG. Of course, CG does not calculate eigenvectors like Lan-DR does. 

Example 1. We use a test matrix that has many small eigenvalues. It is a diagonal 
matrix of dimension n = 5000 whose diagonal elements are 0.1, 0.2, 0.3, . . . 9.8, 9.9, 10, 
11, 12, ... , 4909, 4910. The right-hand side is a random normal vector. We apply the 
method Lan-DR(100,40), which at each restart keeps the 40 Ritz vectors correspond- 
ing to the smallest Ritz values and then builds a subspace of dimension 100 (including 
the 40 approximate eigenvectors). With k-SO, at every iteration of all of the cycles 
except the first, there is a reorthogonalization against 40 vectors. We first look at how 
well Lan-DR computes the eigenvalues. Figure 3.1 shows the residual norms for the 
40 Ritz vectors. The desired number of eigenvalues is 30 and the desired residual tol- 
erance is 10~ 8 . It takes 57 cycles for the first 30 eigenvalues to reach this level. Since 
this is a fairly difficult problem with eigenvalues clustered together, it takes a while 
for eigenvalues to start converging. However, from that point, eigenvalues converge 
regularly, and it can be seen that many eigenvalues and their eigenvectors can be 
computed accurately. The orthogonalization costs are significantly less than for fully 
reorthogonalized IRAM (about 84 vector operations for orthogonalization per Lan- 
DR iteration versus an average of 280 for IRAM). For a matrix that is fairly sparse 
so that the matrix-vector product is inexpensive (and also with cheap preconditioner, 
if there is one), the difference in orthogonalization is significant. 

We continue the example by comparing Lan-DR(100,40) to unrestarted Lanczos. 
Figure 3.2 has the residual norms for the smallest and 30th eigenvalues with each 
method. The results are very similar for the first eigenvalue, in spite of the fact that 
Lan-DR is restarted. The presence of the approximate eigenvectors corresponding to 
the nearby eigenvalues essentially deflates them and thus gives good convergence for 
Lan-DR. For eigenvalue 30, Lan-DR trails unrestarted Lanczos, but is still compet- 
itive. This is significant, since Lan-DR(100,40) requires storage of only about 100 
vectors compared to nearly 3000 for unrestarted Lanczos. 

Next, we look at the solution of the linear equations. Figure 3.3 has Lan-DR 
with three choices of m and k and also has CG. The convergence of Lan-DR(100,40) 
is very close to that of CG. The deflation of eigenvalues in Lan-DR(100,40) allows it 
to compete with an unrestarted method such as CG. Lan-DR(100,10) is not too far 
behind CG, but Lan-DR(30,10) restarts too frequently and is much slower. 

4. Reorthogonalization. It was mentioned earlier that some reorthogonaliza- 
tion is necessary to control roundoff error. We now look at this in more detail and 
give several possible approaches. The first is full reorthogonalization which takes every 
Lanczos vector formed by the three-term recurrence and reorthogonalizes it against 
every previous vector. The expense for this in terms of vector operations of length 
n varies from about 2k to 2m per iteration. The second approach is periodic re- 
orthogonalization [2U]. For this, we always reorthogonalize the Vk+i and Vk+2 vectors, 
then at regular intervals reorthogonalize two consecutive vectors (see |68[ 166] for why 
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Number of Cycles 

Fig. 3.1. Computing many eigenvalues of a matrix with small eigenvalues. 




Matrix-Vector Products 



Fig. 3.2. Comparison of Lan-DR(100,40) with unrestarted Lanczos for first and 30th eigenvalues. 

consecutive vectors need to be reorthogonalized) . The cost for this varies as with full 
reorthogonalization when it is applied, but it saves considerably if the reorthogonaliza- 
tion is not needed frequently. Next is partial reorthogonalization (PRO) [59, 58, 68, 66J 
which monitors loss of orthogonality and thus determines when to reorthogonalize. 
As suggested in [55], we use "global orthogonalization" , so when we reorthogonal- 
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Fig. 3.3. Comparison of Lan-DR with CG for solving linear equations. 

ize, it is against all previous vectors. As with periodic, we always reorthogonalize 
the Vk+i and Vk+2 vectors and always reorthogonalize two consecutive vectors. This 
approach can be cheaper than periodic reorthogonalization, because it waits until 
reorthogonalization is needed. 

The next three reorthogonalization methods are related to the three above, but 
reorthogonalization is done only against the first k vectors. Here we are using the 
idea from selective reorthogonalization (SO) [47; that orthogonality is only lost in the 
direction of converged or converging Ritz vectors [41]|g6]. Since restarting is used, 
normally only the k Ritz vectors that are kept at the restart have an opportunity to 
converge. There can be exceptions as will be discussed in Examples 3 and 4 below. 
The first SO-type approach has been used in the previous section. It is called k- 
SO and has reorthogonalization at every iteration against the k Ritz vectors. This 
requires about 2k vector operations per iteration versus an average of about k + m 
vector operations for full reorthogonalization. The last two methods are k-periodic 
and k-PRO. These are the same as periodic and PRO, except they reorthogonalize 
only against the first k vectors. 

Example 2. We consider again Lan-DR for the matrix of Example 1. For this 
problem, loss of orthogonality is controlled by the restarting and the reorthogonal- 
ization of the Vk+i and vectors at the restart. No further reorthogonalization 
is needed. Table 4.1 show a comparison with full reorthogonalization. The second 
column gives the loss of orthogonality as measured by \\V^V m — I m xm\\ at the end 
of 57 cycles. The next two columns have the residual norms of the first and thirtieth 
Ritz pairs. While full reorthogonalization gives greater orthogonality of the Lanczos 
vectors, the Ritz vectors end up with similar accuracy. The thirtieth eigenvector con- 
tinues to converge beyond cycle 57 and eventually reaches residual norm of 3.7 x 10~ 12 
even with the approach of reorthogonalizing only at the restart. For this example, 
the Ritz vectors converge slowly enough that we don't have a Ritz vector appear and 
significantly converge in one cycle (see Figure 3.1). So before a eigenvector has con- 
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Table 4.1 

Full reorthogonalization vs. reorthogonalize only at the restart 





orthogonality 


rn 1 


rn 30 


k+1, k+2 vectors only 


2.2 x 1(T 12 


5.6 x 1CT 12 


9.9 x 1(T 9 


full reorthog. 


1.2 x 1CT 14 


6.7 x lO" 12 


9.9 x 1(T 9 



verged, an approximation to it is among the group of Ritz vectors that Vk+i and Vk+2 
are reorthogonalized against. This explains why reorthogonalizing at restarts turns 
out to be often enough. 

This example points out that restarting can make reorthogonalization easier. Re- 
orthogonalization against only 40 vectors is done for two vectors every 60 iterations. 
If we compare to unrestarted Lanczos using PRO with global reorthogonalization 
and with tolerance on loss of orthogonality of square root of machine epsilon, the 
number of vectors reorthogonalized is similar. However, as the unrestarted Lanczos 
iteration proceeds, there are many previous vectors to reorthogonalize against. Also 
unrestarted Lanczos with PRO gives converged eigevectors with residual norms of just 
below 10~ 6 compared to well below 10~ n for Lan-DR(100,40) with reorthogonaliza- 
tion only at the restart. We note however, that the PRO tolerance can be adjusted 
for more frequent reorthogonalization and greater accuracy. 

The next matrix is designed so that Lan-DR needs more reorthogonalization. We 
compare approaches and look at some potential problems. 

Example 3. Let the matrix be diagonal with dimension n — 5000 and diagonal 
elements 1,2,3,... 9, 10, 100, 101, 102, . . . , 5088, 5089. The right-hand side is a random 
normal vector. We use 12 cycles of Lan-DR(120,40) with the reorthogonalization 
approaches described at the beginning of this section. More reorthogonalization is 
needed than in Example 2, because eigenvectors converge quicker. Table 4.2 has the 
results. Two different tolerances on the loss of orthogonality for the PRO methods are 
used, square root of machine epsilon and three-quarters power. The second column 
of the table gives the frequency of reorthogonalization (of two vectors) for periodic 
methods. The next three columns are the same as columns in the previous table. The 
last column has another measure of the effect of the roundoff error. It gives the number 
of iterations needed to solve a second right-hand side using the deflated CG method 
which will be given in the next section. We see that k-periodic reorthogonalizing 
of two vectors every 40 iterations gives fairly good results. Accuracy drops as the 
reorthogonalization is done less frequently. With frequency of 75, Lan-DR is still able 
to compute some eigenvectors accurately, but unlike with frequency of 70, multiple 
copies of eigenvalues appear. Also, Lan-DR is not longer helpful for the solution of 
the second right-hand side. Partial reorthogonalization gives results somewhat similar 
to periodic restarting without the need to select the frequency ahead of time. For k- 
PRO with ejj 5 , a total of 82 vectors are reorthogonalized and with €q, there are 44 
reorthogonalizations. This compares to 50 for k-periodic with frequency of 40. 

Example 4- We continue with same matrix. Because it has fairly rapidly converg- 
ing eigenvalues, it can be used to demonstrate a possible problem with reorthogonal- 
izing against only the k saved Ritz vectors. Table 4.3 has results for Lan-DR with 
k-SO for k = 40 and changing values of m. Each test is run the number of cycles so 
that the total number of matrix-vector products is 1000 or just over. Even though 
k-SO reorthogonalizes at every iteration, there is increasing loss of orthogonality as 
m increases, until a high level of orthogonality comes back at m = 200. We explain 
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Table 4.2 
Compare reorthogonalization methods 





reor. freq. 


orthogonality 


rn 1 


rn 30 


2nd rhs it's 


full 


1 


1.3 x 10~ i4 


7.5 x 10~ i2 


1.7 x 10" iu 


57 


k-SO 


1 


1.8 x lO" 11 


7.5 x 10~ i2 


1.8 x 10" iu 


57 


k-pcriodic 


40 


6.7 x 10~ iU 


5.8 x 10~ i2 


1.2 x 10-* 


57 




DO 


1.7 X 10 


4.5 X 10 


6.6 X 10 


57 




70 


4.0 x 10~ 5 


5.9 x 10~ 12 


3.7 X 10" 5 


70 




75 


9.0 


4.6 x 10~ 12 




218 


periodic 


40 


3.1 x 10- 10 


5.8 x 10~ 12 


2.0 x 10" 9 


57 




70 


2.8 x 10~ b 


5.9 x 10~ 12 


2.0 x 10" b 


57 




75 


2.4 x 10~ 5 


4.6 x 10~ 12 


5.5 x 10" 5 


57 




80 


5.5 


5.5 x 10~ 12 


8.0 x 10" 3 


213 


k-PRO, e b 




8.4 x 10~ 9 


4.9 x 10~ 12 


1.5 x 10~ 7 


57 


k-PRO, e^ b 




1.0 x lO" 11 


4.9 x 10~ i2 


2.1 x 10~ iU 


57 


PRO, e Q b 




7.6 x 10~ 9 


4.9 x 10~ 12 


1.1 x 10" 7 


57 


PRO, e- (b 




4.8 x 10 _i2 


4.9 x 10~ 12 


1.7 x 10 _iU 


57 



Table 4.3 

An effect of rapidly converging eigenvectors. 





k-SO 




full reorthogonalization 




m , k 


orthogonality 


rn30 


orthogonality 


rn 30 


100, 40 


4.8 x 10~ 12 


1.2 x 10" iu 


1.2 x 10~ i4 


6.3 x lO" 11 


120, 40 


1.8 x 10" 11 


1.8 x 10" 1U 


7.5 x 10~ 12 


1.8 x 10" 10 


140, 40 


1.1 x 10~ x 


3.6 x 10-'' 


5.8 x 10~ i2 


1.2 x 10~ x 


160, 40 


1.3 x 10" 3 


6.3 x 10" 5 


4.8 x 10" 12 


6.6 x 10" 7 


180, 40 


1.0 


1.4 


5.9 x 10~ 12 


3.7 x 10- b 


200, 40 


1.2 x 10~ 12 


3.4 x 10" 11 


4.6 x 10~ 12 


3.4 x 10" 11 



first what happens when m = 140. Recall that this matrix has 10 well separated 
eigenvalues. Also recall we fully reorthogonalize in the first cycle. After that first 
cycle of Lan-DR(140,40) with k-SO, there are only seven Ritz values below 10. After 
the second cycle, all ten small eigenvalues have converged to a high degree. Some or- 
thogonality is lost in that cycle, because by the end of the cycle, there are converged 
Ritz vectors in the subspace that are not reorthogonized against. The effect is more 
extreme with m = 180, because then nine of the ten small eigenvalues converge part 
way in the first cycle while one eigenvalue is missing (the one at 7.0). That missing 
one converges very early in the second cycle and orthogonality is lost after that. So 
it is dangerous to use k-SO if some eigenvectors converge very rapidly (within one 
cycle). On the other hand, k-SO and the other k- methods have been successful for 
other problems, such as in Example 2 and for the large QCD matrices that are in 
later experiments. The next example shows another possible problem with k-SO. 

Example 5. For matrices with outstanding eigenvalues other than the small ones 
that can converge in one cycle, care must be taken. For k-SO to work, Ritz vectors 
corresponding to those eigenvalues need to be included in the Ritz vectors that are 
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saved at the restart. We use the same matrix as in the previous example, except the 
largest eigenvalue is changed from 5089 to 5400. This eigenvalue is now outstanding 
enough to converge rapidly. Lan-DR(120,40) with k-SO loses orthogonality of its 
basis. After 12 cycles, the orthogonality level is 3.4 x 10~ 3 , and it is actually worse 
(near 1) at earlier cycles. As mentioned, this can be fixed by including the large 
eigenpair among those selected to be saved for the next cycle. Another option is to 
use the reorthogonalization against all previous vectors instead of just the first k. 

5. Multiple Right-hand Sides. Next, solution of systems with multiple right- 
hand sides is considered. We suggest a simple approach that uses the eigenvectors 
generated during the solution of the first right-hand side to deflate eigenvalues from the 
solution of the second right-hand side. First a projection is done over the Ritz vectors 
at the end of Lan-DR for the first right-hand side. Then the standard conjugate 
gradient method is used. We call this approach deflated CG or D-CG, since it deflates 
out eigenvalues before applying CG. It is similar to the init-CG approach [16] that 
also does a projection before CG. However, init-CG projects over the entire Krylov 
subpace generated by CG while solving the first right-hand side, while D-CG uses a 
projection over a compact space that has the important eigen-information. 

D-CG 

1. After applying the initial guess xq, let the system of equations be A(x — xq) = 

7*0. 

2. If it is known that the right-hand sides are closely related, project over the 
previous computed solution vectors. 

3. Apply the Galerkin Projection for Vfe. This uses the Vk+i and Tk matrices 
from (|3.3p that were developed while solving the first right-hand side with 
Lan-DR. Specifically, solve Tkd = VjFro, where Tk is the diagonal k by k 
portion of Tk, and let the new approximate solution be x = Jo + Vkd and the 
new residual be r = ro — AVkd = ro — Vk+iTkd. 

4. Apply CG until satisfied with convergence. 

5.1. Examples. 

Example 6. We consider the same matrix as in Example 1 and solution of a 
second right-hand side. The first right-hand side system has been solved with Lan- 
DR(100,40). We first illustrate how increasing the accuracy of the approximate eigen- 
vectors helps the convergence of the second right hand side. Figure 5.1 has convergence 
curves for the second right-hand side with D-CG when Lan-DR has been run different 
numbers of cycles. CG is also given and it lags behind all the D-CG curves. The 
Lan-DR linear equations relative residual converges at 20 cycles. However, D-CG is 
not as effective as it can be if Lan-DR has only been run 20 cycles. The eigenvectors 
are not all accurate enough to deflate out the eigencomponents from the residual of 
the second right-hand side, and eventually CG has to deal with these components 
and this slows convergence. D-CG after 40 cycles converges rapidly and does not slow 
down as it procedes. Using 60 cycles of Lan-DR is only a little better. Figure 5.2 has 
the components in the directions of the eigenvectors corresponding to the 80 smallest 
eigenvalues of the residual vector for D-CG on the second right-hand side after the 
projection over the approximate eigenvectors that come fromt the first right-hand 
side. These components improve significantly as Lan-DR on the first is run more 
cycles. 

We next consider varying the number of approximate eigenvectors that are used 
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Fig. 5.1. Solving the first right-hand side to different levels of accuracy. 

for deflation. For the first right-hand side, Lan-DR(m,k) is run for 50 cycles with 
changing k and with m = k + 60. Figure 5.3 has the convergence results for then 
applying D-CG to the second right-hand side. With k = 10 eigenvectors, D-CG 
is already significantly better than regular CG, but deflating more eigenvalues is 
even better. For this example there is a significant jump upon going from 80 to 120 
eigenvectors. This happens because having 120 gets past the 100 clustered eigenvalues 
and pushes well into the rest of the spectrum. 

Figure 5.4 shows that Lan-DR/D-CG can come out ahead of regular CG in 
terms of matrix-vector products even if we spend more on Lan-DR for the first 
right-hand side. We use 10 right-hand sides. The first is solved with 44 cycles of 
Lan-DR(180j,120). Then the other nine use D-CG with relative residual tolerance of 
10~ 8 . This all takes about as many matrix-vector products as solving three systems 
with CG. 

5.2. Comparison with block-CG. Block-CG [40j EH [57] generates a Krylov 
subspace with each right-hand side as starting vector then combines them all together 
into one large subspace. This large subspace can generally develop approximations 
to the eigenvectors corresponding to the small eigenvalues, so block-CG naturally 
deflates eigenvalues as it goes along. As a result, block-CG can be very efficient in 
terms of matrix- vector products. 

Block methods require all the right-hand sides be available at the same time, while 
Lan-DR/D-CG only needs one right-hand side at a time. Block methods have extra 
orthogonalization expense compared to non-block methods. Simple block-CG can be 
unstable, particularly if the right-hand sides are related to each other. This can be 
controlled by the somewhat complicated process of removing ( "deflating" ) right-hand 
sides [39] . 

As mentioned, block-CG can converge quickly. For instance with the matrix from 
Example 1 and 20 random right-hand sides, block-CG needs only 3100 matrix-vector 
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Fig. 5.2. Eigencompents for residual for the second right-hand side after solving the first right- 
hand side to different levels of accuracy. 



products. Lan-DR(180,120)/D-CG uses 4909. The next example shows that Lan- 
DR/D-CG can be competitive with block-CG. 

Example 7. We use the same matrix as in Example 3, but with n = 10000 (so the 
largest element is 10089). We compare the Lan-DR/D-CG approach with block-CG 
for 20 random right-hand sides. Lan-DR has m = 100 and k = 15 and runs for four 
cycles. Figure 5.5 shows that the two approaches converge at almost the same number 
of matrix- vector products. However, Lan-DR has less orthogonalization expense. Lan- 
DR with k-periodic reorthogonalization of two vectors every 40 iterations followed by 
D-CG for the remaining 19 right-hand sides uses 14,000 vector operations of length 
n. Block-CG needs 177,000. 

5.3. Related right-hand sides. 

Example 8. We use the same matrix as in the previous example. There are again 
20 right-hand sides, but this time the first is chosen randomly and the others are chosen 
as &W = &( 1 )+10 _3 *ran^ , where ran^ is a random vector (elements chosen randomly 
with Normal(0,l) distribution. The convergence tolerance is moved to relative residual 
below 10~ 6 , because block-CG has instability after that point. Figure 5.6 shows 
that Lan-DR does a better job of taking advantage of the related right-hand sides. 
However, as mentioned earlier, block-CG can be improved by removing right-hand 
sides once they become linearly dependent. 

5.4. Example from QCD. Many problems in lattice quantum chromodynam- 
ics (lattice QCD) have large systems of equations with multiple right-hand sides. For 
example, the Wilson-Dirac formulation |19l [10] and overlap fermion [36l H] computa- 
tions both lead to such problems. Deflation of eigenvalues is used for QCD problems, 
for example in [TTl Q31 Q21 [371 EH (Ml [Ml QCD matrices have complex entries. 
They generally are non-Hermitian, but it may be possible to change into Hermitian 
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Fig. 5.4. Lan-DR and D-CG versus regular CG for 10 right-hand sides. 



form. This can be done by multiplying the system by 75 [19 or multiplying by the 
complex conjugate of the QCD matrix. We consider the first of these options, which 
gives an indefinite matrix. 

Example 9. We choose a large QCD matrix of size n = 1.5 million. The n value 
is set near to K-critical, which makes it a difficult problem. There are at least a dozen 
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right-hand sides for each matrix and sometimes over a hundred. The first right-hand 
side is solved using Lan-DR(m,k) with several values of k. The linear equations 
solution does not converge past a residual norm of 0.036. This shows Lan-DR may 
not be stable for an indefinite problem and helps motivate the development of Minres 
methods in the next section. However, Lan-DR does generate useful eigenvectors that 
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Fig. 5.7. Comparison of CG to D-CG with varying numbers of eigenvalues deflated for a large 
QCD matrix. 



can be used to solve the other right-hand sides. Figure 5.7 shows the convergence 
for solution of the second right-hand side system using D-CG with three choices 
of k. These are compared to CG. We note that deflating 20 eigenvalues gives a big 
improvement over regular CG. Using 150 eigenvectors is almost an order of magnitude 
improvement over CG. 

The results in Figure 5.7 are fairly typical. Figure 5.8 shows the even iterations 
only for five configurations (matrices). D-CG with k = 100 eigenvalues deflated is 
compared with CG. There is some variance in CG, but with 100 small eigenvalues 
taken out, the convergence of D-CG is almost identical for all matrices. 

6. Minres-DR and deflated Minres. We now consider symmetric/Hermitian 
indefinite problems. For stability, we need minimum residual [UJ E01 Hlj versions 
of our methods. Instead of Lan-DR, a method Minres-DR can be used. It solves 
the linear equations problem with a minimum residual projection and it computes 
harmonic Ritz vectors [27l \17\ [42J, [3U [66l [35] instead of regular Ritz vectors. Harmonic 
Ritz approximations are more reliable for interior eigenvalues. We next give the steps 
only that are changed from Lan-DR to Minres-DR. 

Minres-DR(m,k) 

3. Eigenvector computation. Compute the k smallest (or others, if desired) 
harmonic Ritz values, 9i, and let g{) be the corresponding vectors. See [35j 
for more, including residual norm formulas. 

4. Linear equations. Let c m+ \ — Vk+\r$. Solve the least squares problem 
min\\c m +i — T TO d|| for d and set x — xq + V m d. Then r = ro — Ax = 
ro — V m +iT m d. If satisfied with convergence of the linear equations and the 
eigenvalues, can stop. If not, let the new xo = x and ro = r and continue. 

5. Restart. Let P be the m + 1 by k + 1 matrix whose first k columns come from 
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orthonormalizing the gi vectors (and adding zeros for the m + 1 row). Let 
e m be the mth coordinate vector of length m. Then the k + 1 column of P 
is the vector [— < m +i, m T'~ T e m , 1] T [15] orthonormalizcd against the previous 
columns. The new V and T matrices are formed from the old ones: V^ff = 
V m +iP and T k = P T T m P m j ; , where P m ,k has the first m columns and k 
rows of P. Set V k+1 = Vgff and T k = T^ W . 

For the second and subsequent right-hand sides, D-Minres (deflated Minres) can 
be used. Like D-CG, it has a projection over the approximate eigenvectors, but this 
is followed by Minres [13]. 

Example 10. For an indefinite matrix, we use a diagonal matrix of dimension 
n = 1000 whos diagonal entries are generated with random numbers distributed Nor- 
mal(0,l) that are shifted 2.0 to the right. Then there are 22 negatives among the 
1000 eigenvalues. The five closest to the origin are -0.015, -0.033, 0.041, -0.53, and 
-0.57. Indefinite problems can be difficult if there are very many eigenvalues on both 
sides of the origin, unless the eigenvalues are well separated from the origin. Fig- 
ure 6.1 has a comparison of the Minres methods with the Galerkin methods Lan-DR 
and D-CG for solution of three right-hand sides. For the second and third right-hand 
sides, D-Minres is used for both tests because the Matlab CG function found instabity 
and would not proceed. We see that the results are fairly similar, except Minrcs-DR 
converges more smoothly than Lan-DR. It seems that stability concerns are the main 
reason to use the Minres versions. 

Example 11. The Minres methods are next applied to the first large QCD matrix 
from Example 9. Figure 6.2 shows Minres-DR(200,60) as diamonds at the end of 
each cycle. Minres-DR converges almost as fast as standard unrestarted Minres. 
We also see that D-Minres gives greater improvement as the number of approximate 
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Fig. 6.1. Minres-DR vs. Lan-DR for the first rhs with D-Minres for the 2nd and 3rd rhs'. 




- » <h 



Minres-DR(200,60) 
- diamonds 




1000 1500 2000 

Matrix-vector products 



Fig. 6.2. Minres-DR and D-Minres for the large QCD matrix. 



eigenvectors is increased. Finally, D-Minres converges a little faster than D-CG does in 
Example 9. For instance, following Lan-DR(200,150), D-CG takes 348 matrix-vector 
products. Meanwhile, D-Minres needs only 310 when following Minres-DR(200,150). 



7. Conclusion. Lan-DR is a restarted Lanczos method that solves a symmet- 
ric/Hermitian positive definite system of linear equations and simultaneously com- 
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putes eigenvalues and eigenvectors. The restarting allows the computation of eigen- 
vectors when there are limits on storage. The presence of the approximate eigenvectors 
in the subspace helps the convergence. In fact, the convergence is often close to that 
of unrestarted Lanczos for both the linear equations and the eigenvalues, in spite of 
the restarting. 

There are several options for reorthogonalizing. Included arc methods that only 
reorthogonalize against the k saved Ritz vectors. For these methods, there may be 
trouble if there are rapidly converging eigenvalues that converge in one cycle. Restart- 
ing generally keeps reorthogonalization costs down. 

For subsequent right-hand sides, deflated CG first uses a projection over the 
eigenvectors that were computed by Lan-DR while solving the first right-hand side 
system, then applies CG. For difficult problems, the convergence of CG can be much 
faster after the small eigenvalues are deflated out. Experiments on large problems 
from QCD back this up. 

Minres-DR is a version of Lan-DR for indefinite problems. For indefinte systems 
with multiple right-hand sides, a deflated Minres method is given. 
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